07. Spark Broadcast

What is Spark Broadcast?

Spark broadcast variables are read-only variables that are distributed to the worker nodes and cached there. Without them, the driver ships any data a task needs together with the task itself, so the same data can cross the network again for every task that uses it. A broadcast variable is sent to each worker once and then reused by all tasks running on that worker, which reduces network overhead and communication between the driver and the workers. Broadcast variables are created through the SparkContext.
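As a minimal sketch of the idea above, the PySpark snippet below broadcasts a small lookup table and reads it inside tasks through `.value`. The table contents, the user records, and the helper names `tag_with_country` and `run_on_spark` are made up for illustration; only `SparkContext.broadcast`, `.value`, and `unpersist` are part of the Spark API.

```python
def tag_with_country(records, country_names):
    """Replace each (user, country_code) pair with (user, country_name)."""
    return [(user, country_names.get(code, "unknown")) for user, code in records]


def run_on_spark():
    # PySpark is imported here so the helper above can be used without it.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Hypothetical lookup table, small enough to fit in memory on every worker.
    country_names = {"US": "United States", "DE": "Germany", "JP": "Japan"}

    # Ship the table to each executor once, instead of with every task.
    bc_names = sc.broadcast(country_names)

    # Hypothetical records: (user, country_code).
    users = sc.parallelize([("ana", "US"), ("ben", "DE"), ("chi", "FR")])

    # Tasks read the cached copy through .value and must not modify it.
    result = users.mapPartitions(
        lambda part: tag_with_country(list(part), bc_names.value)
    ).collect()

    bc_names.unpersist()  # release the cached copies on the executors
    return result
```

Calling `run_on_spark()` in a notebook with PySpark installed returns the tagged records. The lookup logic sits in a plain function so it can be checked without a cluster.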

When is broadcast usually used in Spark?

SOLUTION:
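  • A broadcast join is a way of joining a large table with a small table in Spark: the small table is copied to every worker node, so the large table does not have to be shuffled across the network.
  • A broadcast join is like a map-side join in MapReduce.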
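To make the map-side idea concrete, here is a plain-Python simulation of what happens during a broadcast join. It is not Spark code: the function name `broadcast_hash_join` and the sample tables are assumptions for illustration. The small table is turned into a hash map once, and each partition of the large table is joined against that map locally.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_table):
    """Inner-join partitions of (key, value) rows against a small table."""
    # Build the hash map once from the small table: key -> list of values.
    # In Spark, this is the part that would be broadcast to every worker.
    lookup = defaultdict(list)
    for key, value in small_table:
        lookup[key].append(value)

    joined = []
    # Each partition would be handled by a separate task on some worker;
    # no rows of the large table need to move between workers.
    for partition in large_partitions:
        for key, value in partition:
            for small_value in lookup.get(key, []):
                joined.append((key, value, small_value))
    return joined


# Hypothetical data: orders split into two partitions, and a small customers table.
orders = [[(1, "book"), (2, "pen")], [(1, "lamp"), (3, "desk")]]
customers = [(1, "ana"), (2, "ben")]

print(broadcast_hash_join(orders, customers))
# prints [(1, 'book', 'ana'), (2, 'pen', 'ben'), (1, 'lamp', 'ana')]
```

In the PySpark DataFrame API, the same behavior can be requested with the `broadcast` hint from `pyspark.sql.functions`, for example `large_df.join(broadcast(small_df), "key")`, where `large_df`, `small_df`, and `"key"` are placeholder names.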

Exercise: Broadcast Example

Run the starter code in the Jupyter Notebook to practice broadcast joins.